I.  Introduction

In a recent paper (Killworth and Bernard, 1978, hereafter referred to as
RSW) we began an experimental investigation of social structure. For our purposes,
understanding social structure requires two essentially different kinds
of information. First, we need to know, on average, how many people are known
to each person in a group (such as the U.S.), and who they are. This would
provide a description of social structure. Second, we need to know how people
think they are related to the people they know. This would provide an explanation
of the description. The small-world literature, dating from Pool and
Kochen in 1958 (but published in 1978) and Milgram (1967), has advanced our
knowledge of the numbers of people in a person's network, and of the degree of
connectedness of individual networks. However, except for some subtle work by
Lin et al. (1978), there has been very little investigation of how it is that
individuals perceive that they are connected to the people in their network.
(For a review of the small-world literature to date, see Bernard and Killworth,
1979.)

This led us, in RSW, to investigate experimentally how and why people
think they know each other. In our experiment (the reverse small-world) we
presented 98 informants ("starters" in the literature) with a long list of
fictional people ("targets"). For each target, we provided some basic information:
name, race/ethnicity, location and occupation. Starters provided the
name of a choice whom they felt would be most likely to know the target, or to
know someone who might know the target, and so on. In other words, informants
provided the name of someone in their own network who would serve as the first
link in a small-world chain to the target. Informants also provided some information
about their choices. They told us their relationship with the choice
(relative, friend, or acquaintance), and they checked a list of reasons for
selection of a choice. The reasons were location, occupation, race/ethnicity
and "other." For example, if an informant said a choice was picked on the
basis of location, then something about the location of the target and/or the
choice was somehow connected in the informant's mind.

Overwhelmingly,  location  and  occupation  were  the  important  reasons  for 
choice.  Characteristics  of  informants  (except  for  their  sex)  had  little  effect 
on  the  type  of,  or  reasons  for  choices.  However,  characteristics  of  targets 
were  highly  correlated  with  both  type  of,  and  reasons  for  choices.  For  example, 
the  most  likely  reason  for  choosing  an  intermediary  for  a  given  target  could  be 
predicted 81% of the time, based on the target's occupation and distance from
Morgantown,  West  Virginia,  where  the  experiment  took  place. 

RSW yielded a lot of valuable information, but had two serious shortcomings.
First, we had very little information about the choices. We knew their
names (and thus, in most instances, their gender), and whether they were relatives
or friends of the informant. Second, the RSW instrument was closed-ended; it
provided only a few selected pieces of information about each target (as in
traditional small-world experiments), and forced informants to choose intermediaries
and provide reasons for their choices, based solely on these pieces
of information.

Informants' comments about the RSW experiment revealed both an occasional
need for more information, and that a connection between a choice and target
could be very indirect. For example, some informants asked about the religion
of the targets; many informants wanted to know the sex of targets whose exotic
or foreign names concealed this piece of information.

Quite often, choices selected on the basis of location did not live (and
had never lived) anywhere near the target. Nonetheless, on these occasions,
informants insisted that the choice was associated with the target's location.
The choice's children, for example, might have gone to college in the target's
home state.

The comments by our informants regarding the indirectness of such associations
were convincing. We attempted to build both direct and indirect associations
into a model of the process by which informants made their choices
(Killworth and Bernard, 1979). In order to test this model, we assumed that
each link in a small-world chain belongs to one of a discrete set of classes or
states in a Markov process. (We do not assume that the decision-making process
is Markovian, only that the mechanics whereby the next choice is made are independent
of the history of the small-world chain.) Many of the transition probabilities
had to be guessed, lacking data about them, or even confirmation that
all the states in our model existed. It could be argued, therefore, that the
good fit between the model's predictions and known facts generated by small-world
experiments was fortuitous or self-tuned.
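The Markov assumption described above can be illustrated with a minimal sketch. The state names and transition probabilities below are hypothetical, chosen only for illustration; the text does not list the model's actual classes or values, and this is not the authors' implementation.

```python
import random

# Hypothetical states and transition probabilities (illustration only).
TRANSITIONS = {
    "associated_location": {"associated_location": 0.5,
                            "direct_location": 0.3,
                            "target": 0.2},
    "direct_location": {"direct_location": 0.4, "target": 0.6},
    "target": {"target": 1.0},  # absorbing state: the chain has reached the target
}

def simulate_chain(start, rng, max_links=50):
    """Return the list of states visited until the target is reached."""
    state, path = start, [start]
    while state != "target" and len(path) < max_links:
        options = TRANSITIONS[state]
        # The next state depends only on the current state, not on the
        # earlier history of the chain (the Markov assumption).
        state = rng.choices(list(options), weights=list(options.values()))[0]
        path.append(state)
    return path

rng = random.Random(0)
lengths = [len(simulate_chain("associated_location", rng)) - 1 for _ in range(1000)]
mean_chain_length = sum(lengths) / len(lengths)
```

With guessed probabilities such as these, a simulated mean chain length can be made to agree with published small-world results, which is exactly why a good fit alone does not validate the model.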

In order to improve the credibility of such a model, we need to know what,
if any, information informants need about a target (aside from location, occupation,
sex, and race) to make their best choice. And we need to know how
informants actually make their choice, once they are armed with a collection of
facts about a target. In order to study this, an open-ended reverse small-world
experiment was conducted. We consider this a member of a genre we call INDEX,
or "informant defined experiment." The idea is to study social structure experimentally,
but to allow the subjects of the study to define the information which
is collected.

We turn now to a description of the experiment, followed by a discussion
about the coding of the data. The analysis of the data is organized as follows.
There is a natural division into problems connected with one or more of: questions
asked, choices made, reasons given, together with information about informants
and targets. These topics are analyzed first singly, then in combination
with one another, where appropriate, in order to find out the relationships
between the variables.


II.  The  Informant-defined  Reverse  Small-World  Experiment

A list of 50 mythical targets was constructed. Each target was given a
name (and, therefore, gender), an occupation, a location, and a racial identity,
as in traditional small-world experiments. Occupations were selected from the
Duncan scale (Duncan, 1961) to represent a cross-section of life in the U.S.
Three housewives were included, and were assigned husbands with occupations;
three students were included, and were assigned both fields of study and part-time
jobs; three retired persons were included, but were assigned occupations
prior to their retirement.

Location was rather more complicated. We divided the U.S. into six categories
of location: 1) near-urban (i.e., Morgantown, WV); 2) near-rural (i.e.,
the surrounding county); 3) "medium" urban (i.e., cities within a 250 mile
radius of Morgantown); 4) medium-rural; 5) "far" urban (i.e., cities further than
250 miles from Morgantown); and 6) far-rural. The first two categories were assigned
five targets each; the other four were assigned 10 targets each. Five
Black targets and 45 White targets were defined. Twenty-five males and 25 females
were included.

In addition to these usual identifiers in small-world experiments, we assigned
some additional information to the targets, based on informants' comments
during RSW. Targets were assigned ages, ranging from 20-70 years; a religion; an
education level, ranging from grade school to graduate degree (in six gradations);
and a marital status. Table 1 summarises the socioeconomic characteristics initially
assigned to the 50 targets.

We explained the reverse small-world procedure to a group of six pretest subjects,
and asked them to select a choice for each target. However, they were
given no information whatever about any target, not even a target's name. Informants
were told that they could ask for any information they felt they needed
about any target in order to make a choice of intermediary. This pretesting
revealed that targets' organisational affiliations and hobbies were frequently
requested by informants. The instrument was modified and targets were assigned
a maximum of five hobbies and five organisations each. Several pretest informants
asked whether targets were active or not in religion; and whether targets had
children, how many, and of what ages. This information was added to the "personal
history" of the targets.

Fifty informants provided us with the data reported in this paper. Informants
were solicited by advertising, and were offered $20 each for their participation.
Interviews lasted, on average, 2.5 hours. Table 2 summarises some
characteristics of our informants.


The reverse small-world procedure was explained to informants. We told them
that we had complete life histories of 50 people from around the U.S., but that
targets' names and characteristics had been shuffled in order to protect anonymity.
Targets were presented to informants in random order. (We shall use the term
"sequence" to mean when, from one to fifty, a target was presented to an informant.)
Informants were to ask us questions about each target, until they felt able
to make a choice. Informants were asked initially to explore any avenue they
felt might be helpful, and to eliminate questions they found to be of no help as
they went along. Many informants had difficulty in comprehending the task of
matching a network member to a target. These people had to be taught how to
"play the game." We used non-explicit examples for illustration, in order to
avoid biasing the informant: "After you have asked questions you will have a
set of information about the target. Try to think of associations (whatever you
think an 'association' is) between the target and a friend or a relative, or an
acquaintance of yours. You will probably want to pick the person you know who
is somehow 'associated,' by your definition, with the target." With a few informants
it was even necessary to illustrate the concept of the small-world problem
graphically. This was simply not an easy experiment to conduct; in collecting
these data we may have channeled our informants' behavior in subtle ways
which we have no way to control for.


Figure 1 shows how informants rapidly adjusted to the experimental procedure,
usually in the first seven targets. By about the eighth target, informants
settled down to asking a reasonably steady number (though not type) of
questions. However, in spite of our exhortation to eliminate unhelpful questions,
many persons continued to ask questions throughout the experiment which (by their
own claim) were never helpful. (Periodically, we asked informants why they kept
asking questions which they rated as unhelpful. The typical response was that
they were developing a "feel for the target," which allowed them to "exclude"
many choices from their consideration, thus making the task of choosing more
efficient.)

Of course, we did not actually have complete life-histories for each target.
The life histories were incremented as the experiment progressed. Whenever an
informant requested information about a particular target which was not already
contained in that target's dossier, either the informant was told that the information
was not obtainable, or the information was made up on the spot. In the
latter case the relevant information was added to the target's dossier for potential
use by later informants. This resulted in some inconsistent target characteristics.
(For example, one target wound up with a father who had two distinct
birthplaces.) If a piece of information was added to a target on the tenth informant,
this meant that the previous nine informants had not requested the
information. An example of a target's dossier, after 50 informants, is shown in
Table 3.
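The incremental dossier can be pictured as a lookup table that is filled in on demand. The following is only a sketch of that bookkeeping; the function and key names are hypothetical and do not describe the interviewers' actual records.

```python
def answer_question(dossier, question, invent_answer):
    """Return the recorded answer, inventing and recording one if it is missing.

    `dossier` maps a question to its answer for one target.
    `invent_answer` stands in for the interviewer: it returns a made-up
    answer, or None when the information is declared "not obtainable".
    """
    if question not in dossier:
        answer = invent_answer(question)
        if answer is None:
            return None             # nothing is added to the dossier
        dossier[question] = answer  # kept for potential use by later informants
    return dossier[question]

# Hypothetical example data.
dossier = {"occupation": "pharmacist"}

def invent(question):
    return {"hobbies": "plays guitar"}.get(question)

first = answer_question(dossier, "hobbies", invent)
second = answer_question(dossier, "hobbies", lambda q: "collects stamps")
```

Because an existing entry is always reused, this sketch would not produce the inconsistencies the text reports; those arose because the real process depended on interviewers checking the dossier by hand.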


Pretesting revealed many questions which informants might ask. This was
subject to some necessary concatenations. One informant asked if one target
played the guitar. We interpreted this as a request for information about the
target's hobbies, and we told the informant so. Some informants asked questions
at the beginning of the interview which suggested that they did not understand
their task. Such questions as "What color is the target's hair" or "What kind
of car does the target drive" were handled by giving the informant an answer and
letting him judge the information's usefulness. In most cases, informants needed
further explanations and stopped asking such questions. Subsequently, the questions
were not recorded. If an informant insisted that a piece of information
was useful, however, it was recorded as a question. An example of this was an
informant who asked the exact birth date of a target. When presented with a
birth date she proceeded to make a choice on the basis of astrological signs.
This question was then recorded as "used."


Each question was assigned a unique identifying number, with no connotation
as to order. As new questions were asked in the actual experiment, each was assigned
a number. Table 4 presents a list of all questions asked during pretest
and test phases. For later reference, note that question 3 refers to target's
occupation, and question 14 to target's location.


For each target the procedure was as follows: as each question was asked,
its code number was recorded, preserving the sequence in which they were asked.
When informants had asked enough questions, they stated their choice, defined to
be someone who would act as an intermediary in the small-world process. Then
they provided a "few sentences" which explained why they had selected a particular
intermediary (i.e., "because he's a real estate agent," or "because his girlfriend's
father is a pharmacist," or "because she was a graduate student at ____ and
because he (target) would have been there two years ago when she (choice) left.")
Next, informants ranked the questions they asked by the degree to which the answer
had helped them make their choice. Each informant was required to select a
first-ranked question for each target, and was given the opportunity to rank
other questions (if he or she had asked more than one) second, third, fourth, or
fifth, stopping as they felt appropriate. We reminded informants of all questions
they had asked (but had not ranked), and inquired whether each had been
"helpful" or "unhelpful." Thus each question asked for each target was accompanied
by a code indicating its degree of usefulness. The relationship (friend
or relative) of each choice to the informant was also recorded.
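The information recorded for one informant and one target could be held in a record like the one sketched below. The class and field names are assumptions made for illustration; the text does not describe the authors' actual storage format.

```python
from dataclasses import dataclass, field

@dataclass
class QuestionString:
    """One informant's questions about one target (hypothetical layout)."""
    informant: int
    target: int
    sequence: int                # when (1-50) the target was presented
    questions: list              # question code numbers, in the order asked
    usefulness: dict = field(default_factory=dict)  # code -> rank 1..5, "helpful" or "unhelpful"
    relationship: str = ""       # "friend" or "relative"
    explanation: str = ""        # the informant's "few sentences"

    def most_useful(self):
        """Return the code of the first-ranked question, or None."""
        for code, grade in self.usefulness.items():
            if grade == 1:
                return code
        return None

# Made-up example: location (14) asked first, occupation (3) ranked most useful.
record = QuestionString(
    informant=7, target=23, sequence=12,
    questions=[14, 3, 21],
    usefulness={3: 1, 14: 2, 21: "unhelpful"},
    relationship="friend",
    explanation="because he's a real estate agent",
)
```

Keeping the order asked separate from the usefulness grades allows the two to be analyzed singly and in combination.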

After  completing  the  test,  each  informant  answered  a  questionnaire.  This 
consisted  of  basic  socioeconomic  data,  and  a  personal  response  to  any  question 
ever  asked  by  the  informant  about  any  target.  For  example,  if  the  informant  ever 
asked  where  a  target's  spouse  went  to  school,  then  (unless  the  informant  were 
single)  he  or  she  provided  equivalent  information  about  his  or  her  own  spouse. 

III.  Coding 

The experiment yielded four essentially different sets of data: 1) information
(which we had created) about 50 targets; 2) information about informants;
3) information concerning the questions asked by informants; and 4) information
about the informants' choices and those choices' relationships to the targets.

The target data were coded first, since they contain more information than
the equivalent informant information. The informant data contain less information
because informants were not asked to provide data about themselves on questions
they never asked. By the end of the experiment, the known answers to each
of the 98 questions ever asked about any target (Table 4) were coded in a format
which left room for every possible answer. Informant data were then coded using
the same format as for targets.

As noted above, questions were coded in the sequence asked by informants,
followed by a ranking of each question's usefulness, which was also provided by
the informant.


Finally, we developed a scheme to code the information about choices' relations
to targets. This information was contained in the short (usually one or
two sentences) explanations given by informants on why they made a particular
choice. Four concepts were introduced: the "direct hit," the "associated hit,"
the "via," and the "intervening choice." If an explanation revealed that a characteristic
of a choice matched exactly to a characteristic of the relevant target,
this was a direct hit. For example, if a target lives in Los Angeles and the
choice for that target also lives in Los Angeles, then this counts as a direct
hit. If, on the other hand, a target lives in Los Angeles and the choice lives
in San Francisco, then if, and only if, the informant said he selected the choice
on the basis of location, this counts as an "associated hit." Associated hits
can occur for a wide variety of reasons. If an informant says he chose a pharmacist
in order to get to a physician because "they are both in the medical
field," then this is an associated hit. Similarly, a farmer and a tractor salesman
may be associated by occupation; a student choice may be associated with a
college administrator; a choice who plays jazz trumpet as a hobby may be associated
with a target who collects jazz records, and so on. The concepts of "associated
location" and "associated occupation" were introduced in our earlier model
of the small-world decision process (Killworth and Bernard, 1979). Our experience
in this experiment has broadened the concept to include associations such as hobbies,
organizations, religions, etc.
In fact, our experience with these data has shown that simple associations
are not enough to describe all the relationships which informants claim exist
between their choices and the targets. This led to the "associated via" and
"intervening choice" categories. Consider the case of a choice who is a coal
miner linked, by an informant, to a target who lives in Kentucky. The coal
miner choice may, in fact, live in Ohio. But if the informant says "I chose him
because he is a coal miner and he could contact people in Kentucky where there
are lots of coal miners," then we believe this is best described as "associated
with target's location via choice's occupation." Some other examples include the
following. "I chose her because she belongs to the Sierra Club and the target
works for the Environmental Protection Agency" counts as "associated
with target's occupation via choice's organisational affiliation." "I chose him
because he does cross country skiing and the target lives in Vermont" is coded as
"associated with target's location via choice's hobby." "I chose him because he
collects rocks and the target is a geology student" is coded as "associated with
target's field of study via choice's hobby."

Finally, many of our informants were apparently thinking two steps into the
small-world problem when they said such things as "I chose him because his girlfriend
worked at Kroger's grocery and the target owns a grocery store." This
counts as "associated with target's occupation via intervening choice's occupation."
The choice was not associated with the target by any characteristics of
his own; but his girlfriend (whom the informant may not have known well enough
to name as his choice) is associated with the target's occupation. For simplicity,
we code the fact that the girlfriend is an intermediary choice, and that she is
somehow associated with the target's occupation. Another example is the following:
"I chose her because her father used to be a professional pool hustler. He could
contact the target who likes to play pool." This was coded as "associated with
target's hobby via intervening choice's occupation."
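The four coding concepts can be summarised as a small decision rule. The sketch below assumes that a human coder has already reduced each explanation to the target's characteristic, the choice's characteristic, whether the two values match exactly, and whether a third person intervenes; the function name and arguments are hypothetical, and the real coding was done by hand from free text.

```python
def code_explanation(target_trait, choice_trait, exact_match=False,
                     via_other_person=False):
    """Return a coding label for one explanation.

    target_trait / choice_trait are category names such as "location",
    "occupation" or "hobby" (hypothetical inputs prepared by a coder).
    """
    if via_other_person:
        # The association runs through someone the choice knows.
        return (f"associated with target's {target_trait} "
                f"via intervening choice's {choice_trait}")
    if target_trait != choice_trait:
        # Different kinds of characteristic are linked: the "via" category.
        return (f"associated with target's {target_trait} "
                f"via choice's {choice_trait}")
    if exact_match:
        return f"direct hit on {target_trait}"
    return f"associated hit on {target_trait}"

# Examples taken from the text.
los_angeles = code_explanation("location", "location", exact_match=True)
san_francisco = code_explanation("location", "location")
coal_miner = code_explanation("location", "occupation")
kroger = code_explanation("occupation", "occupation", via_other_person=True)
```

The intervening-choice test comes first because, in the grocery example, the characteristics match in kind (occupation to occupation) yet belong to a third person.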

IV.  Questions


As Table 4 shows, a total of 8? different questions were asked by informants.
(This does not include six questions which were asked only once, each by one informant.)
Obviously, some questions were asked more frequently than others.
Figure 2 shows the probability that the most frequent questions are ever asked.
Note the dominance of questions 3 and 14, occupation and location respectively,
asked on 92% and 90% of all occasions. Other questions were asked much less frequently.
The most commonly asked of these are questions 2 (age of target, asked
42% of the time), 29 (sex of target, asked 3?% of the time), 30 (marital status,
24%), and 21 (hobbies, 21%). These probabilities can also be interpreted as a
fractional contribution to the total number of questions ever asked, with 3 and
14 together contributing 38% of all questions ever asked.


The final curve in Fig. 2 shows the cumulative effects of the contributions.
Four questions (3-occupation, 14-location, 2-age, 29-sex) account for more than
50% of all questions ever asked. Ten questions account for more than [illegible] of all
questions ever asked; eighteen questions account for [illegible]; twenty-five questions
account for 9[illegible].
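The three quantities plotted in Figure 2 (probability of being asked, fractional contribution, and cumulative contribution) can be computed from the recorded question strings roughly as follows. This is a sketch on made-up data, assuming each question is counted at most once per string; the function name is hypothetical.

```python
from collections import Counter

def question_statistics(strings):
    """strings: list of question strings, each a list of question codes."""
    asked_in = Counter()
    for s in strings:
        asked_in.update(set(s))      # count a question once per string
    n_strings = len(strings)
    total_asked = sum(asked_in.values())
    # Probability that a question is asked on a given occasion.
    prob_asked = {q: c / n_strings for q, c in asked_in.items()}
    # Fractional contribution to all questions ever asked.
    share = {q: c / total_asked for q, c in asked_in.items()}
    # Cumulative contribution, most frequent question first.
    ranked = sorted(share, key=lambda q: (-share[q], q))
    cumulative, running = [], 0.0
    for q in ranked:
        running += share[q]
        cumulative.append((q, running))
    return prob_asked, share, cumulative

# Made-up strings using the paper's codes (3 occupation, 14 location, ...).
toy_strings = [[3, 14], [14, 3, 2], [3, 14, 29, 2], [14, 21]]
prob_asked, share, cumulative = question_statistics(toy_strings)
```

Under the same assumption, a question's share equals its probability of being asked divided by the mean string length, which is how the two interpretations in the text are related.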


Figures 3a-3c show similar curves, restricting attention respectively to
instances where questions were declared by informants to have been the most helpful,
at all helpful, or unhelpful to them in making a choice. Figure 3a shows that
two questions (occupation and location) account for 6[illegible]% of all "most helpful"
(i.e., top ranked) questions. When questions 21 and 26 are added, over [illegible] are
accounted for. Question 16 accounts for another 5%; question 4 accounts for 4%;
and then the curve drops off.

The picture is changed subtly when we consider questions which are graded
as at all helpful (not necessarily first-ranked) by informants (Figure 3b).
Again, location and occupation dominate. However, eight questions are required
in order to account for "all helpful questions." It is perhaps surprising that
the distribution of questions graded as "unhelpful" is largely the same as for
those graded as "helpful" (Figure 3c). This suggests that people tend to ask the
same questions about all targets.

The number of questions asked by informants varied greatly, as shown in
Fig. 4. The mean number of questions in a string was 4.8, s.d. 2.7; but note
that one informant once asked a string of 21 questions before making a choice.
The mean number of questions asked by informants differs significantly** between
informants, from 1.4 to 9.6. (Henceforth, single asterisks denote significance
levels; double asterisks denote 1% or better.) Similarly, some targets
required significantly** more questions than others, from a minimum (average) of
3.4 for a target in Youngstown, Ohio, to a maximum of [illegible] for a target in
Morgantown.


The length of a given question string, of course, depends on the difficulty
the informant had in making a choice. In fact, there is strong evidence that if
neither location nor occupation is very useful in a given case, the informants
ask many more questions in an attempt to find some basis for making a choice.
Specifically, question strings in which questions 3 or 14 were ranked first or
second most useful are, on average, about one question shorter than strings
where this was not the case. These differences are significant** for each of
the six combinations we examined: location was most useful, or it was not; occupation
was second most useful, or it was not; etc. In many cases there were also
significant differences between informants and/or targets, but this does not affect
the conclusions.


When informants had difficulty making a choice, they tended (not surprisingly)
to stop when they reached the most useful question. Question strings
of six or more questions end with the most useful question significantly* more
often than would be expected by chance. For example, of the 15 strings containing
13 questions, five terminated with the most useful question.


There is evidence that informants become set in their habits of asking questions,
even when their own results suggest they should change. (This is an experimental
phenomenon which obviously biases our results. Hence the randomizing
of the order in which targets were presented.) The mean number of different
questions generated by an informant, over all 50 targets, was 10.7, s.d. 3.[illegible],
although one informant asked only four different questions, and one asked [illegible].
During interviews several informants were bothered by their inability to think
of questions to ask. This led to the development of a basic set of questions
which these informants used over the 50 targets. They often felt no reason to
change their set of questions for a different target, as a new set of probabilities
for a match was presented each time. (Note that the mean number of
different questions generated by all informants per target was 28.3, s.d. 3.1.
This much narrower variation per target than per informant suggests that the
total amount of information needed by any informant for any target is remarkably
uniform.)

Similarly,  the  correlation  between  the  "sequence"  (see  above)  when  a  given 
question  was  first  asked  by  an  informant,  and  the  percentage  of  time  it  was 
asked  thereafter,  is  significantly**  negative.  In  other  words,  questions  asked 
about  the  first  few  targets  tend  to  be  used  for  most  targets;  questions  which 
were  asked  for  the  first  time  for,  say,  the  thirtieth  target  are  not  frequently 
used after that. This tendency remains even for questions which are perceived
as  useful  the  first  time  they  are  asked. 

As  expected,  there  is  a  slight,  but  significant  learning  effect.  Let  the 
number  of  times  a  given  question  is  asked  by  an  informant  be  n,  and  the  number 
of  times  it  is  deemed  at  all  useful  be  m.  The  correlation  between  the  fractional 
time  it  was  useful,  m/n,  and  n  is  O.li***.  Hence  there  is  a  weak  tendency  to  use 
questions  more  frequently  when  they  are  perceived  as  helpful. 
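The learning effect is measured by correlating m/n with n over (informant, question) pairs. A minimal sketch of that calculation follows, using an ordinary Pearson coefficient and made-up counts; the text does not state which correlation statistic was used, so the choice of Pearson's r is an assumption.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up (n, m) pairs: n = times a question was asked by an informant,
# m = times it was deemed at all useful.
counts = [(50, 40), (10, 3), (30, 20), (5, 1), (20, 9)]
n_values = [n for n, m in counts]
useful_fraction = [m / n for n, m in counts]
r = pearson(useful_fraction, n_values)
```

A positive r means that questions judged helpful a larger fraction of the time also tend to be asked more often, which is the direction of the effect reported above.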

V.  Sequence  of  Questions 

The position of a given question in a string depended heavily on the particular
question. Figure 5 shows the probabilities that certain questions occur
first, second, third or fourth in a string. Location (14) is highly likely to
be asked first or second, but much less likely thereafter. Occupation (3) is
almost equally likely to be asked first or second or third, but hardly ever after
that. Questions such as target hobbies (21) or organizations (26) are almost
never asked first or second, but occur frequently further down the string.
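Position probabilities of this kind can be tabulated directly from the question strings. The sketch below conditions on the question having been asked at all, which is an assumption (the text does not say how Figure 5 was normalised); the data are made up.

```python
from collections import Counter

def position_probabilities(strings, question, max_position=4):
    """Share of the strings containing `question` in which it sits at
    position 1..max_position (assumes a question occurs once per string)."""
    positions = Counter()
    containing = 0
    for s in strings:
        if question in s:
            containing += 1
            positions[s.index(question) + 1] += 1   # 1-based position
    if containing == 0:
        return {}
    return {k: positions[k] / containing for k in range(1, max_position + 1)}

# Made-up strings using the paper's codes (14 location, 3 occupation, ...).
toy_strings = [[14, 3, 21], [3, 14], [14, 3, 2, 26], [3, 2, 14, 21]]
location_positions = position_probabilities(toy_strings, 14)
hobby_positions = position_probabilities(toy_strings, 21)
```

In this toy data, location is concentrated in the first two positions while hobbies appear only third or fourth, mirroring the pattern described above.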

It is straightforward to define the "most likely" question string. Suppose
we consider only strings beginning with question 14. This may be useful or non-useful
for the informant. In each case he will select a "most likely" second
question; whether this is useful or not determines his third question, and so on.
Figure 6a shows the sequences, and associated probabilities, for such strings.
Sequences beginning with question 3 are almost identical, but with 14 and 3 interchanged.

A clear pattern emerges, with questions on hobbies (21) or organizations
(26), etc., being asked when location or occupation are unhelpful. These strings
are usually quite short, as indicated both in Figure 6a and in the previous section.
Figure 6b shows a similar sequence for strings beginning with question 29
(sex). After finding out the target's age, the informant normally proceeds to
occupation and location, before moving into similar sequences to Figure 6a.
Finally, sequences beginning with question 1 immediately ask location and occupation,
and again continue as in Figure 6a.


There is, of course, a strong causal link between certain questions and those
immediately following: "Where does the target travel?" is almost always preceded
by "Does the target travel?" and can only be asked if the latter question received
an affirmative answer. The causal links can best be presented on a branching
diagram (Fig. 7). We define question j to be causally related to question i if
two conditions are met: (a) the probability that i immediately precedes j in a
string, given that j was asked, be high; (b) the probability that j not be asked,
given i was not asked, be high. The level of causality is then the product of
these two probabilities; unity therefore implies certainty in cases (a) and (b).
Figure 7 traces each question back down the causal chain to its most likely
predecessor (in cases of ties, the lowest numbered question is used for clarity).
Thus question 60, if it occurs, always follows question 99, which, if it occurs,
follows 39 99% or more of the time, and so on. Note that 36 also, if it occurs,
follows 39 99% or more of the time; hence 36 and 99 are almost never asked
together. The connections between 2, 29, 30, 3, 14, and 1 are all negligible,
but are included for completeness.
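
The level of causality defined above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' program: question strings are taken to be lists of question numbers in the order asked, each question is assumed to appear at most once per string, and a level of 0.0 is returned (by convention of this sketch) when either conditional probability is undefined.

```python
def causality_level(strings, i, j):
    """Product of (a) P(i immediately precedes j | j asked) and
    (b) P(j not asked | i not asked), estimated over all strings."""
    asked_j = [s for s in strings if j in s]
    without_i = [s for s in strings if i not in s]
    if not asked_j or not without_i:
        return 0.0  # convention for this sketch: undefined -> no link
    preceded = 0
    for s in asked_j:
        pos = s.index(j)
        if pos > 0 and s[pos - 1] == i:
            preceded += 1
    p_a = preceded / len(asked_j)
    p_b = sum(1 for s in without_i if j not in s) / len(without_i)
    return p_a * p_b
```

A level of 1.0 arises only when j is always immediately preceded by i and is never asked in a string that lacks i.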

Examination of the data shows that the weakness of some of the causal links
is caused by informants shuffling the orders of some of their questions. This
may be due partially to informants' attempts at breaking the monotonous routine
of the experiment. Some felt that part of the "game" was to skip needless
questions ("why ask if a target has children when I can just ask the ages of
children and get more information?"). Some informants even intentionally varied
the order of their questions in order to avoid "getting in a rut." To allow for
this, the requirement that i immediately precede j was relaxed; instead, i need
merely precede j, at some position in a string. Fifty-one questions were found to
be 100% related to 1, 2, 3, 14, or (in one case) 30, and usually to several.3
Sixteen more were related to 3 or 14 at the 90% level or above. Of course, this
reflects the strong probability of 3 and 14 being asked near the beginning of
each string.

If we remove any reference to the order of the questions, and concentrate
instead on the "packaging" by informants of entire question strings, some
systematic patterns emerge. For example, the 2,500 (50 x 50) question strings
were treated as lists of 99 integers. The jth integer in a string was one or
zero, depending on whether question j was asked in that string or not. Factoring
these strings produced the questions which tend to occur together. Nineteen
factors were found; several of these had only one question with a high (varimax)
loading on that factor. Define a group to be those questions with a factor
loading of 0.2 or more on one factor (the "typical" loading is 0.02). The groups
found were: 21, 30, 39, 41, 42, 93, 54 (children); 2, 21, 29, 30 (socioeconomic
status); 37, 38 (travel); 39, 36 (family); 6, 63 (spouse); 23, 24, 49 (schooling);
8, 22, 40 (more socioeconomic questions); 49, 99 (social life); 7, 17 (previous
location, occupation); 21, 26 (spare time); 96, 64, 11, 92 (precise details);
1, 4 (more precise details); 10, 74 (more precise details).

It is interesting to note that 3 and 14 do not occur in these groups,
although 14 has a high loading on factor 2 but with the opposite sign (i.e., when
14 is asked, group 2 tends not to be, and vice versa). A similar factoring, but
on strings coded with -1 (question asked but not useful), 0 (question not asked),
and +1 (question asked and useful), yielded very similar groups.
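
The two codings of a question string used for the factoring can be illustrated as follows. Only the encoding step is sketched; the factor analysis itself is not reproduced. The function names, the list-of-question-numbers input and the set of useful questions are assumptions made for the example.

```python
def indicator_vector(string, n_questions):
    """1 if question q (numbered from 1) was asked in the string, else 0."""
    asked = set(string)
    return [1 if q in asked else 0 for q in range(1, n_questions + 1)]

def signed_vector(string, useful, n_questions):
    """+1 if asked and useful, -1 if asked but not useful, 0 if not asked.
    `useful` is the (hypothetical) set of questions the informant rated useful."""
    asked = set(string)
    coded = []
    for q in range(1, n_questions + 1):
        if q not in asked:
            coded.append(0)
        elif q in useful:
            coded.append(1)
        else:
            coded.append(-1)
    return coded
```

Stacking one such vector per string gives the strings-by-questions matrix on which a factoring of this kind would operate.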


Finally, we chose to examine the pattern of co-occurrence of questions by a
slightly modified form of cluster analysis, operating on groups of questions.
Initially each question was allocated a unique group. The data to be clustered
were the probability that a question in group j would co-occur with one from
group i, given that at least one from group i was asked in a string.4 Groups i
and j were merged if the relevant probability was above a cutoff level (usually
100%, but reduced by 5% as many times as necessary to ensure further merging).
After each set of merging, the probabilities were re-computed; the process halts
when all questions are in one large group. Unfortunately, one large group,
containing 3, 14, 2, 29, etc. was created early in the procedure; because at
least one of these questions was always asked in a string, this group always
co-occurs with other questions, so that this group then swallowed up all the
other questions.
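
The merging procedure can be sketched as below. This is one reading of the description, offered with some hedging: the probability is taken to be the fraction of strings containing at least one question from group i that also contain at least one from group j; groups are merged one pair at a time (rather than in sets) with probabilities recomputed after each merge; and the cutoff starts at 100% and drops in 5% steps. All function names are invented for the illustration.

```python
def cooccurrence(strings, group_i, group_j):
    """P(some question of group_j asked | some question of group_i asked)."""
    with_i = [s for s in strings if group_i & set(s)]
    if not with_i:
        return 0.0
    return sum(1 for s in with_i if group_j & set(s)) / len(with_i)

def find_pair(strings, groups, cutoff):
    """First ordered pair of distinct groups reaching the cutoff, or None."""
    for a in groups:
        for b in groups:
            if a != b and cooccurrence(strings, a, b) >= cutoff:
                return a, b
    return None

def cluster(strings, questions, step=0.05):
    """Return the merge history as (cutoff, merged_group) tuples."""
    groups = [frozenset([q]) for q in questions]
    cutoff = 1.0
    history = []
    while len(groups) > 1 and cutoff > 0:
        pair = find_pair(strings, groups, cutoff)
        if pair is None:
            cutoff = round(cutoff - step, 10)  # relax the cutoff by one step
            continue
        a, b = pair
        groups = [g for g in groups if g not in (a, b)] + [a | b]
        history.append((cutoff, a | b))
    return history
```

In this sketch a group containing a question that is asked in nearly every string co-occurs with everything, which reproduces the "swallowing" behaviour described above.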


To  modify  this,  it  was  decided  not  to  allow  merging  with  any  group  whose 
(weighted  average)  probability  of  co-occurrence  with  any  other  group  exceeded 
an arbitrary figure of 50%. This yielded the nine question groups shown in
Table  5.  Apart  from  the  rare  questions  in  groups  3  to  5,  the  clustering  seems 
to  have  yielded  meaningful  groups  of  questions;  it  is  particularly  enlightening 
that  virtually  all  questions,  other  than  basic  socioeconomic  ones,  are  together 
in  one  group. 

The strong patterning of questions which we have seen throughout this
section confirms the strategy we adopted in RSW of supplying location,
occupation, and sex of target to our informants. We also found in RSW that the
race and/or ethnicity of targets was unimportant to informants, and this result
is confirmed here. Race was only asked 1% of the time in the present experiment,
and ethnicity was asked only %/S of the time. Based on data from this
informant-defined experiment, future RSW-like experiments should include the age,
organizational memberships, hobbies, and marital status (including children) of
targets.

VI. Accounting for Questions


The previous sections described the questions asked by informants. We now
seek to explain the variation in questions by referring to differences between
both informants and targets. At the simplest level, there are 2,500 question
strings, each asked by an informant (with known background data) about a target
(also with background data). It is logical, therefore, to attempt to account
both for the number of questions asked, and for whether a given question was
asked, on the basis of individual strings.


Multiple regression of the number of questions in a string with both
informant and target data accounted for only 16% of the variance. Although this
amount is significant**, many correlations later in the paper account for far
larger variances. To save space, we henceforth choose to ignore any regression
accounting for less than 40% of the variance.5 Similarly, the number of
different questions per informant or per target was not well predicted by
personal characteristics (although targets with children, not surprisingly, had
significantly* more different questions asked about them). Discriminant analyses
were conducted on frequently asked questions, in an attempt to predict for which
informants and targets any given question would be asked. Target data are of
little use in this matter: only the background of the informants has much
bearing on whether they ask a question. For example, question 16 (where's the
target's location near) is asked more* by informants who have lived in places
other than Morgantown than by those who have always lived in Morgantown (V.3
to 2.8 respectively). Additionally, most questions are hardly ever asked, and
even after discriminant analysis, the optimal prediction remains that these
questions are never asked.

However, for three questions it was possible to improve the quality of
this simple prediction. For questions 2 (age), 14 (location), and 29 (sex),
the discriminant function correctly predicts whether the question was asked
67%, 93% and 73% of the time respectively. These are to be compared with the
(null hypothesis) prediction of 58% not asked, 90% asked, and 64% not asked.
In all three cases the structure of the discriminant function is similar: the
higher the informant's occupation and education levels, the more organizations
they belong to and the more hobbies they have, then the more likely age and
sex are to be asked, and the less likely is location to be asked.

Restricting attention to questions which were designated "most useful"
destroys all predictive capability except for question 14. Since the "most
useful" designation depends on the type of target, both informant and target
data affect whether 14 is most useful or not. This can now be predicted 72%
of the time. The higher the informants' age, the lower their education level
and the number of organizations they belong to; the higher the target's
occupation level and education, and the larger the target's town, the more likely
is question 14 to be most useful. (This is one of the few inconsistencies with
RSW; a higher target occupation level should be paired with a higher probability
of occupation being the most useful question. However, the prediction from RSW
is confirmed if we examine the discriminant function for question 3, where the
higher the target's occupation level, the more likely is 3 to be the most useful
question.) Similar results are found for questions which were designated to be
at all useful, save only that target's occupation now has no significant effect
upon whether location is useful.

Having found that little variation in questions could be accounted for on
a string-by-string basis, we analyzed the question strings first averaged over
all targets (i.e., retaining only informant data with which to explain the
variation) and then averaged over all informants (retaining target data).


There  are  a  great  many  questions  which  could  be  asked  of  either  of  these 
data  sets.  In  searching  for  signals  in  the  data  we  chose  to  examine  whether 
differences in dichotomous variables produced significant differences in
question usage. For example, do male informants ask a specific question more than
females?  Then  we  attempted  to  account  for  differences  in  question  usage  by 
regressing  question  usage  against  characteristics  of  informants  or  targets. 

Table 6 presents all 13 examples of significant** differences in question
usage between informants, split into two subgroups by various criteria. There
were 13 additional significant* findings which are not reproduced. The reason
for their suppression is that, of the 700 tests we carried out, we would expect
35 to be significant by chance alone at the 5% level. However, only 7
significant comparisons at the 1% level would be expected by chance; the 13
findings in Table 6 should not, therefore, be a chance occurrence. (In fact, the
total of 13 is itself significant at the 5% level.)
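
The arithmetic behind this argument can be checked with a short sketch. It assumes, for illustration only, that the 700 tests are independent, so that the number of chance findings follows a binomial distribution; the paper does not state how the significance of the total was assessed.

```python
from math import comb

def expected_by_chance(n_tests, alpha):
    """Expected number of tests significant at level alpha under the null."""
    return n_tests * alpha

def prob_at_least(k, n_tests, alpha):
    """P(X >= k) for X ~ Binomial(n_tests, alpha), assuming independent tests."""
    below = sum(
        comb(n_tests, x) * alpha ** x * (1 - alpha) ** (n_tests - x)
        for x in range(k)
    )
    return 1.0 - below
```

Here expected_by_chance(700, 0.05) and expected_by_chance(700, 0.01) give the 35 and 7 quoted in the text, and prob_at_least(13, 700, 0.01) falls below 0.05 under the independence assumption, in line with the parenthetical remark.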


The most interesting finding is the persistent use of question 38
("where does the target travel?") by informants who report that they do not
travel. Presumably those who do travel are more likely to have connections
with the target's location, and therefore have no need to enquire further.
The other finding, not in Table 6, is that males ask location less* than females
and find location to be most useful, when asked first in a string of questions,
more* than females. Combined with the fact that females find location to be
most useful when asked last more** than males, there is a clear difference
in usage of location between the sexes. We have no idea why this consistent
difference emerges from the data.

Multiple correlation of question usage by informant characteristics
yielded very little additional information. No informant characteristics
accounted for: the mean number of questions asked; the number of different
questions asked; the probability of a given question being asked;6 the
probability of a given question being most useful; or the probability of a given
question being not useful. Three multiple correlations did produce acceptable
results (i.e., more than 40% of the variance accounted for, over 50 cases;
nearly all the 125 multiple correlations were statistically significant). For
example, the probability that marital status was first or second most useful
increases as occupation level, number of hobbies and organizations, education,
and degree of activity in religion of the informant increases. This makes
intuitive sense: the more connections an informant has with the rest of the
world (as indicated by organizational affiliations, for example) the easier
it is to "connect" with a target; but if this fails, then it may still be
possible to "connect" with the target's spouse, especially if the target is a
housewife.


Next, the probability of question 14 being most useful when asked first in
a string is higher for male informants and for those who are active in religion,
and those who have children; it also increases with age and income, and decreases
with number of organizations and hobbies, education and occupation level of the
informant. It would seem that informants with few links to the world may ask
other questions after location, but there is less chance that these questions
will be useful. Finally, the probability that question 7 (religion) be most
useful when asked last decreased with informants' occupation level, income,
education, and number of organizations; it increases with age; and it is lower
when the informant is active in religion or has children.


Target characteristics apparently play a much larger role in accounting
for question usage, although there are again only 13 significant** differences
in question usage for different targets, as split into two subgroups by various
criteria. Table 7 shows several interesting features. The sex of the target
is asked more often** for female targets (do informants somehow get clues to a
target's sex from other questions and then seek to confirm their feeling?).
Questions 6 and 63 (relating to spouse's occupation) are asked more often**
about female targets, and conversely question 3 (actual target's occupation) is
asked more often** about males. The split of targets into those in urban and
rural areas confirms the findings in RSW, namely the strong tendency to use
occupation as a reason for rural targets and location for urban targets.


Table 8 demonstrates that target characteristics account for most of the
frequently asked questions. (In fact, multiple regressions of any question
usage or question-related topic on target characteristics almost invariably

level. Conversely, location tends to be the most useful reason when targets
live in urban locations; however, high occupation level and distance also tend
to increase the probability that location be most useful. Note also that
question 1 (name) is most useful, given that it was asked, only for targets near
Morgantown, as might be expected. For such targets, the likelihood of the
target's name being most useful increases as the target's socioeconomic status
decreases. (Name was used in one of three ways: primarily as an identifier
for the next person in the [nonexistent] chain; secondly, if a choice had the
same name as the target; thirdly, if the target's name implied ethnicity which
could be used as a criterion for making a choice.) Questions relating to
hobbies (21) and organisations (26) also yield plausible results: the more
hobbies or organisations a target has, the more likely is the relevant question
to be useful; the likelihood is also increased for targets living further from
Morgantown. However, hobbies are more frequently useful for male targets, and
organisations for female targets, although the latter tendency is very weak.

In summary, characteristics of targets control most of the questions which
informants ask; and characteristics of informants do not appear to have much
influence on which questions are asked. Of course, on a one informant-one
target basis, this is untrue (witness the discriminant functions). The signals
only emerge upon averaging over all informants or targets.

VII. Choices

On average, informants used 40.7 different choices for the 50 targets
(s.d. )) . 9 ). This number is significantly** higher than the 34.7 different
choices for the first 50 targets in RSW. The difference is of course due to
the inclusion of ten very local targets in the current experiment. In fact,
on average, 9.0 different choices (s.d. 0.9) were used for the ten local
targets, suggesting that each informant has a large number of choices for local
targets, as expected from intuition. Only two of the remaining 40 non-local
targets, on average, had one of the "local" choices used for them. If locality
did not matter to informants, this low number would occur by chance less than
one in 10° times. Hence "local" choices are only used for local targets.

Informants made male choices 67% of the time (which is significantly**
higher than the 60% found in RSW, but reflects the same tendency to choose
males). Informants made choices who "knew a lot of people" only 7% of the
time (s.d. 15%). The distribution of those choices across targets is
significantly** less scattered, suggesting that the decision to use someone who
"knows many people" is a function solely of informants, and not of targets.
Similarly, the number of intervening choices used per target varies
significantly less** than per informant, so that the decision to use an
intervening choice depends only on the informant.

often for male targets (3?l.fi) than for female targets (r’7.°); female
informants used relatives as their choice significantly** more often than did
males (ll.< to Y respectively). In RSW we noted that females used family more
than men did, but we had no way to know whether this would still hold when the
target population included 20% local targets (10 out of 50).


The number of different choices made can be fitted well ( J» .1? ) by a
linear combination of informant characteristics. The number increases with
informants' occupation and education levels, number of organisations and
hobbies, income; it decreases with age, and if the informant is active in
religion and/or has children. The number of male choices made decreases with the
informant's age, occupation and education levels, and number of organizations; it
is higher if the informant is male or has children (!u*~ of variance accounted
for).


The number of male choices per target may also be accounted for by target
characteristics ( ’> 1 7 of variance). The number increases with target's
occupation and education levels, number of hobbies and distance from Morgantown;
it also increases for male targets with children; and it decreases with age of
target. Finally, the number of family members used per target increases with
target's distance from Morgantown and age; decreases with occupation and
education levels and number of target's organisations; and increases if the
target lives in a rural area or is female (M? of variance).


Given  this  brief  discussion  of  choices,  we  turn  now  to  the  reasons  they 
were  chosen. 


VIII. Reasons for Choices


It is difficult to separate reasons totally from questions or choices,
so that the degree of usefulness of a question, for example, has frequently
been a feature of the previous sections. However, we can now extend the
concept to include features of the choices discussed in the coding section:
namely the direct hits, associated hits, vias, and intervening choices.


We found in RSW that, for any target, the most popular reason for choice
was always location and occupation (but only "ethnicity" and "other" were
the other possible reasons). The current data permit testing of this
finding. Over the 50 targets, location was the most popular reason for choice
23 times, and occupation 25 times. Only twice were there any other most
popular reasons: once question 6, once question 1;.


The finding is repeated if we consider the most popular reason for choice
per informant. Twenty-one informants used occupation most often, twenty-six
used location most often, and three informants used one each of age, hobbies
and organization most often. Hence the dominant role of location and
occupation as overriding reasons for choice is confirmed.

The data also permitted an independent test

The mean number of direct hits (per informant-target combination) was
0.4, s.d. 1.1. Certain questions are the most likely to be those which are
direct hits. These are the basic collection which recur throughout this
paper: 3, 14 (0.9 times per target over the 50 informants); 2 (0.46); 21
(0.3M); 26, 29 (0.26); and 30 (0.])i). Virtually nothing in the informant or
target data accounts for any variation in quantities connected with direct
hits.


There was a similar number of associated hits (mean 0.95, s.d. 0.62).
Again, occupation and location dominate: 3 was an associated hit 18.1 times
and a via 13.9 times per target; 14 was an associated hit 21.3 times, a via
13.2 times. All other questions occurred only about once per target at most.
Although many of these can be accounted for by regression with target or
informant characteristics, there is little point in explaining very infrequently
occurring phenomena. However, question 3 occurs as an associated hit
significantly** more for male targets (21) than for female targets (15). Question
14 occurs as an associated hit significantly* more for targets with children
(22) than for childless targets (15); and as a via significantly less* for
targets who live in an urban area (11) than for those in a rural area (14).
These three findings tend to agree both with other findings in this paper
and those in RSW.

Finally, there were only 0.1 (s.d. 0.4) intervening reasons, on average;
again questions 3 and 14 dominate the usage in this category (10.3 intervening
hits per target for question 3, 19.9 for 14, 4 for 16 and under 1 for other
questions; and only 3, 14 ever occur more than once per target as an
intervening via).


IX. The "Tag" Concept and Its Role in Predicting Choices

If we are to understand how an individual makes a choice when presented
with limited information about a target, we need to model the decision process
in some way that allows testing. The model we present here is very simple,
and surprisingly successful in predicting the choices made by each informant.


We shall assume that an informant selects a choice for a given target
because, in some sense, the informant perceives the choice to be "similar" to
the target. Furthermore, we assume that if there are several choices which
are similar to a target, the informant chooses the most similar such choice
(however this may be evaluated by the informant). In other words, the
dissonance between the choice and what is known about the target is minimised.


How is similarity between choice and target to be measured? Clearly,
the actual decisions involve highly complex cognitive processes about which
we understand little. As a simplification, therefore, we assumed that a
choice and a target are perceived as similar if and when some facet of the
choice (e.g., where the choice went to school) and some facet of the target
(e.g., where one of the target's children attends school) are either connected
or, at best, identical. In some cases, of course, we had to suggest this
concept to informants; this inevitably must weaken the following case slightly.


We shall term each facet of a target's personal history a "tag."
Although targets began the experiment with very few tags (see Section II), as
the experiment progressed and more data were invented for each target, the
number of tags grew. Of course, the nature of the tags differed widely, as
did their coding within the data. In order to count and catalogue the target
tags, we again simplified the problem. We treated all tags coded in location
style as location tags (i.e., target's location, previous location, family
location, where the target travels, etc. are all location tags). Overlaps
were removed (so that, if a target still lives where he was born, only one tag
is created). Similar tags were counted for occupations, hobbies, organisations,
age, sex, and religion (the latter three, for targets, consisted of one tag
apiece, of course).

Targets develop many tags. Figure 8 shows the probability of a target
possessing 11, 12, ..., or 23 tags (no target possessed a number of tags
outside this range). The mean number of tags was 15.7, s.d. 2.7. Splitting into
the various types of tags yields Figure 9, showing that occupation, hobby and
organization tags are all similarly distributed, with means of 2.5 to 3, s.d.
about 3, while the mean number of location tags was 4.7, s.d. 2.2.


We did not possess directly comparable data about choices; collection of
such data would have presented enormous complexity and was not attempted.7
Instead, we chose to deduce choice data by using the reasons informants gave for
each choice. Whenever a choice achieved a direct, associated, or intervening
match with the target it was chosen for (possibly several targets), that piece
of target data was added to the list of tags for that choice.

The number of choice tags (again, only distinct ones are counted) is much
fewer than for targets. Figure 10 shows the probability of any choice having
0, 1, ..., 12 tags. The mean number is 1.6, s.d. 1.2. Split into categories
(Figure 11), the dominance of location and occupation tags is clear (mean
numbers of 0.75, 0.62, compared with at most 0.14 for all other tag types).


We can now test the simple hypothesis that, for any given target, an
informant will choose the choice that has the largest number of matching tags
with that target. (This procedure is of course biased by the way we obtained
the choice tags: the correct choice, for a given target, almost invariably
possesses some tags in common with that target. However, we will allow for
this statistically below.) We define two non-location tags to match only if
they agree completely; in other words, an occupation tag of "symphony orchestra
conductor" does not match with "symphony orchestra player." Location tags
match if the choice and target location tags correspond to positions in the
U.S. less than some cutoff distance apart. These cutoffs were taken to be 444
km, 222 km, 113 km, etc., down to 7 km, and a final cutoff of 0 km
(corresponding to a location match only if the two locations are identical).
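
The matching rule can be sketched as follows. Several details are assumptions made for the illustration and are not taken from the paper: tags are represented as (kind, value) pairs, location values are (latitude, longitude) pairs, distances are great-circle distances from the haversine formula, and the match count is the number of choice tags that match at least one target tag.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula

def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def tags_match(choice_tag, target_tag, cutoff_km):
    """Non-location tags must agree exactly; location tags must lie less than
    cutoff_km apart, or be identical when the cutoff is 0 km."""
    kind_c, value_c = choice_tag
    kind_t, value_t = target_tag
    if kind_c != kind_t:
        return False
    if kind_c != "location" or cutoff_km == 0:
        return value_c == value_t
    return distance_km(value_c, value_t) < cutoff_km

def count_matches(choice_tags, target_tags, cutoff_km):
    """Number of choice tags that match at least one target tag."""
    return sum(
        1 for c in choice_tags
        if any(tags_match(c, t, cutoff_km) for t in target_tags)
    )
```

For instance, with approximate coordinates for Morgantown and Pittsburgh (roughly 90 km apart), two location tags would match at the 113 km cutoff but not at 56 km or below.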


Thus, for each distance cutoff, and for each informant-target combination,
we can nominate the 'optimal' choice(s) as being the choice(s) which has
(have) the most tags in common with the target, and then compare the optimal
with the actual choice. We defined two ways to score the accuracy of this
prediction, which we term the "easy" and "difficult" scores. The easy score is
unity whenever the actual choice is among the optimal choices, and zero
otherwise; the difficult score is 1/(number of optimal choices) if the actual
choice is among the optimal choices, and zero otherwise. In other words, the
easy score counts how often the actual choice was correctly (but not
necessarily uniquely) predicted; the difficult score counts how often we would
be correct if we chose at random from among the optimal choices.
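
The two scores can be sketched in a few lines of Python. This is only an
illustration of the definitions: the function name `tag_scores`, the
representation of tags as sets of strings, and the example people and tags are
all hypothetical, and only exact matching (the 0 km cutoff) is shown.

```python
from typing import Dict, Hashable, Set, Tuple


def tag_scores(
    choice_tags: Dict[Hashable, Set[Hashable]],
    target_tags: Set[Hashable],
    actual_choice: Hashable,
) -> Tuple[float, float]:
    """Return (easy, difficult) scores for one informant-target pair.

    choice_tags maps each of the informant's choices to its set of tags;
    tags match only if they are identical (hypothetical encoding).
    """
    # Count the tags each choice shares with the target.
    matches = {c: len(tags & target_tags) for c, tags in choice_tags.items()}
    best = max(matches.values())
    # The optimal choices are all those achieving the maximum count.
    optimal = {c for c, k in matches.items() if k == best}
    if actual_choice not in optimal:
        return 0.0, 0.0
    # Easy: 1 if the actual choice is optimal.
    # Difficult: chance of picking it at random among the optimal choices.
    return 1.0, 1.0 / len(optimal)


# Hypothetical informant with three choices.
choices = {
    "Ann": {"Chicago", "teacher"},
    "Bob": {"Chicago"},
    "Cy": {"teacher", "golf"},
}
unique_hit = tag_scores(choices, {"Chicago", "teacher", "chess"}, "Ann")
three_way_tie = tag_scores(choices, {"Chicago", "golf"}, "Bob")
```

In the first call Ann alone has two matching tags, so both scores are 1; in
the second all three choices tie with one match each, so the easy score is 1
and the difficult score is 1/3.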

The results of this are shown in Table 9, averaged over all informant-target
combinations. Two things are immediately obvious. The first is that maximal
accuracy is obtained when the location matches exactly, although there is
little degradation if two location tags are up to 28 km apart. Henceforth, we
shall treat location tags like all others, and require an exact match. The
second is the high rate of accuracy: the actual choice is among the optimal
choices 89% of the time, and is predicted 60% of the time if the choice is
selected at random from among the optimal choices.

The high success rate, combined with the simplistic approach, suggests that
we might improve the accuracy still further if we weighted the tags in some
fashion before counting the matches. We examined eight different weighting
combinations, shown in Table 10. Note that no weighting achieves the accuracy
of the simplest counting scheme; also, the overall importance of location and
occupation is confirmed by scheme 3's scores of 0.52, 0.82. We are forced to
conclude that if the model has relevance to the decision process, then all
tags should be counted, independent of their directness or lack thereof. In
linear programming terms, then, all tags have an equal utility.

Informants occasionally asked about schooling. In our coding, schools were
not allocated a precise location (i.e., they were not given Cartesian
coordinates) but were recorded by state and an identifying arbitrary code.
Including schools as a separate tag might improve accuracy. However, if the
school's state is used as a tag (so that all schools in the same state are
identical for predictive purposes) this weakens the scores to 0.59, 0.87.
Using the school's unique code as a tag improves the accuracy, but only by 1%,
to 0.60, 0.90 for difficult and easy scores respectively. Hence inclusion of
schools is of no real help in predicting choice.

The difficulty with interpreting these results stems almost entirely from
the biased way the choice tags were obtained. It seems intuitively obvious
that if one obtains some choice tags from the target for which that choice was
made, then that choice is likely to be the one with the largest number of tags
matching with that target. Clearly, we need to estimate how likely it is that
we would achieve the levels of accuracy observed in our data by chance.

To calculate this we need three sets of probabilities. The first, $\alpha_r$,
$r = 0, 1, 2, \ldots, 5$, are the probabilities that the actual choice has $r$
tags matching the target. These probabilities, from the data, are shown in
Figure 12. The mean number of matching tags is 1.56, s.d. 0.84. The other sets
are $\beta_n$, $n = 11, 12, \ldots, 23$, the probabilities that a target has
$n$ tags (given previously in Figure 8), and $\gamma_m$,
$m = 0, 1, \ldots, 12$, the probabilities that a choice has $m$ tags (given
previously in Figure 10).

We assume all tags are of a similar type (retaining the different categories
would involve awkward partitions of integers, without adding significantly to
the results). Let there be $N$ tags in total (there are 451 different target
tags in all: 126 location, 84 occupation, 60 hobbies, 109 organisations,
37 ages, 2 sexes, and 13 religion tags). We might take $N$ as 451, or perhaps
only (126 + 84), depending on our interpretation of the number of tags.

Now the probability that a random choice tag matches a random target tag is
clearly $1/N$. If the target has $n$ different tags, and the choice has $m$
different tags, then the probability of exactly $t$ matching tags is

$$q_t = \binom{m}{t} \left(\frac{n}{N}\right)^{t} \left(1 - \frac{n}{N}\right)^{m-t}$$

by the binomial theorem (the $n/N$ factor derives from $n$ chances of $1/N$,
of course). Hence the probability of less than $r$ matches, $P_r$, is given by

$$P_r = \sum_{s=0}^{r-1} q_s .$$


Thus, if the correct choice has $r$ matches, the probability that another
random choice achieves less than $r$ matches, and is not optimal, is $P_r$,
and the probability that a random choice achieves the same number of matches
is $q_r$. The probability that another choice achieves more tag matches than
the correct choice is $(1 - P_{r+1})$.

These probabilities, so far, are conditional upon the values of $r$, $m$, and
$n$. Multiplying by $\alpha_r \beta_n \gamma_m$ and summing over $r$, $m$, and
$n$ yields the probability $P$ that a choice has fewer tag matches than the
correct choice, and $Q$ that a choice has the same number of tag matches (and,
therefore, $R = 1 - (P + Q)$ that a choice has more tag matches).
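
As a rough sketch of this calculation, the following Python computes the
binomial match probability, the probability of fewer than r matches, and the
marginal P, Q, and R. The function names and the toy distributions for alpha,
beta, and gamma are hypothetical; the observed distributions appear only in
the figures and are not reproduced here.

```python
from math import comb
from typing import Dict, Tuple


def q(t: int, m: int, n: int, N: int) -> float:
    """Probability that a choice with m tags matches exactly t of the
    target's n tags, each choice tag matching with probability n/N."""
    if t < 0 or t > m:
        return 0.0
    p = n / N
    return comb(m, t) * p**t * (1 - p) ** (m - t)


def fewer_than(r: int, m: int, n: int, N: int) -> float:
    """Probability of fewer than r matches: the sum of q(s) for s < r."""
    return sum(q(s, m, n, N) for s in range(r))


def marginal_pqr(
    alpha: Dict[int, float],
    beta: Dict[int, float],
    gamma: Dict[int, float],
    N: int,
) -> Tuple[float, float, float]:
    """Weight the conditional probabilities by alpha_r * beta_n * gamma_m
    and sum over r, n, and m to obtain P, Q, and R = 1 - (P + Q)."""
    P = Q = 0.0
    for r, a in alpha.items():
        for n, b in beta.items():
            for m, g in gamma.items():
                w = a * b * g
                P += w * fewer_than(r, m, n, N)
                Q += w * q(r, m, n, N)
    return P, Q, 1.0 - (P + Q)


# Toy distributions (hypothetical, chosen only so that each sums to one).
alpha = {1: 0.6, 2: 0.4}   # matches achieved by the correct choice
beta = {15: 1.0}           # number of tags on a target
gamma = {1: 0.5, 2: 0.5}   # number of tags on a choice
P, Q, R = marginal_pqr(alpha, beta, gamma, N=451)
```

With so few tags per choice relative to N, nearly all of the probability falls
in P, which is the qualitative pattern the text reports.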

The expected value for the easy score $E$ is then given, since there are on
average 41 different choices, by

$$\varepsilon(E) = (P + Q)^{40} = \varepsilon(E^2),$$

which is simply the probability that none of the other 40 choices scores more
than the correct choice. The variance is then given by

$$\sigma_E^2 = (P + Q)^{40} - \left[(P + Q)^{40}\right]^2 .$$

The difficult score is slightly more awkward to evaluate numerically. The
probability that exactly $a$ of the other 40 choices achieve the same score as
the correct choice, and none a higher score, is

$$u_a = \binom{40}{a} Q^{a} P^{40-a},$$

in which case the correct choice is one of $a + 1$ optimal choices, giving
expectancies for the difficult score $D$ as

$$\varepsilon(D) = \sum_{a=0}^{40} \frac{u_a}{a+1}, \qquad \varepsilon(D^2) = \sum_{a=0}^{40} \frac{u_a}{(a+1)^2},$$

and variance

$$\sigma_D^2 = \varepsilon(D^2) - (\varepsilon(D))^2 .$$

We can now compare the observed scores with those expected. With $N = 451$
different tags,

    P = 0.922,       Q = 0.07
    ε(E) = 0.859,    σ_E = 0.348
    ε(D) = 0.271,    σ_D = 0.204

Over 2,500 cases, the observed mean easy score was 0.89. Dividing the
standard deviation by (2,500)^(1/2) = 50, we find that 0.89 is some
4 standard deviations above expected. Similarly, the observed difficult score
of 0.60 is 80 standard deviations above expected. Thus both observed scores
are significantly** higher than expected by chance, although it should be
borne in mind that 0.86 is the expected easy score (i.e., the system is biased
toward a high score).
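
The chance-level expectations can be checked numerically. The sketch below
assumes that when a of the other 40 choices tie with the correct choice (and
none beats it) the difficult score is 1/(a + 1). It also assumes Q = 0.074:
the third decimal of Q is not legible in the source, and this value is our
guess, chosen only because it roughly reproduces the quoted expected easy
score. The function names are hypothetical.

```python
from math import comb, sqrt
from typing import Tuple


def expected_scores(P: float, Q: float, others: int = 40) -> Tuple[float, float, float, float]:
    """Chance-level mean and s.d. of the easy (E) and difficult (D) scores.

    P: probability another choice has fewer matches than the correct choice.
    Q: probability another choice has the same number of matches.
    """
    # Easy score: no other choice beats the correct one.
    e_E = (P + Q) ** others
    sd_E = sqrt(e_E - e_E**2)
    # Difficult score: a others tie, the rest score lower, score 1/(a+1).
    e_D = e_D2 = 0.0
    for a in range(others + 1):
        u_a = comb(others, a) * Q**a * P ** (others - a)
        e_D += u_a / (a + 1)
        e_D2 += u_a / (a + 1) ** 2
    sd_D = sqrt(e_D2 - e_D**2)
    return e_E, sd_E, e_D, sd_D


def z_score(observed: float, expected: float, sd: float, cases: int) -> float:
    """Standard deviations of the mean over `cases` cases above expectation."""
    return (observed - expected) / (sd / sqrt(cases))


# P is from the text; Q = 0.074 is an assumed value (see lead-in).
e_E, sd_E, e_D, sd_D = expected_scores(P=0.922, Q=0.074)
z_E = z_score(0.89, e_E, sd_E, 2500)
z_D = z_score(0.60, e_D, sd_D, 2500)
```

Under these assumptions the expected easy and difficult scores come out near
0.85 and 0.27, and the observed difficult score lies roughly 80 standard
deviations above expectation, in line with the figures quoted in the text.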

Adjusting the effective number of tags only makes this conclusion firmer.
Restricting attention to location and occupation, which, from Table 10,
achieves difficult and easy scores of 0.52, 0.82 respectively, yields expected
scores of 0.15 (s.d. 0.15), 0.65 (s.d. 0.48) respectively. Again, the observed
scores are significantly** high.

It should be noted that we deliberately did not restrict the target's tags
to what each informant knew about the target (i.e., we compared choice
location tags with a target's places of travel, whether or not the informant
had asked about the target's travel). This was to permit the other choices
more chance of matching tags with the target. If we do restrict the target's
tags to what each informant asked about, the mean difficult and easy scores
rise still further to 0.78, 0.94 respectively. It is clear that this is too
biased to be regarded as a fair test of the tag concept.


Tags:  A  More  Detailed  Examination 


The tag concept discussed in the previous section is a simplistic one. A
detailed ethnographic study of how informants make a selection between choices
of apparently equal utility would be very valuable. However, the 89% success
rate of the straightforward counting procedure, although biased, obviously
accounts for a large amount of the decision process.


The number of tags differs strongly between targets, and the total number of
choice tags differs between informants. We examined whether characteristics of
either targets or informants could account for this variation. Multiple
regression of the number of occupation tags for a given target showed that 46%
of the variance could be accounted for by a linear combination of
socioeconomic variables (as usual, we suppress all cases where less than 40%
of the variance is explained). The largest contributors are targets' age and
occupation level: the higher the target's age, and the lower his occupation
level, the more occupation tags that target possesses. This is plausible: too
low an occupation level forces informants to search for other occupations
related to that target. The only other target characteristic which accounted
for any variance in the data (apart from trivial connections like number of
hobby tags with number of hobbies) was that targets who travel have
significantly* more hobby tags than those who do not travel. We cannot account
for this; it may simply reflect a bias in the construction of the target data.

Similarly, informant data accounts for little variation in the total number
of tags (not necessarily distinct) which their choices possessed. Each
informant had a total of 75 tags on average, of which 29 were location and 25
occupation. The only significant result is that female informants have more*
occupation tags than male informants, by 28 to 21. Thus, little about targets
or informants accounts for variation in tag density.


Another, more subtle, bias in our use of tags is the degree of utility of
each tag. Because of the manner in which choice tags were deduced, the
majority of them have, in some undefined sense, a high degree of utility for
that informant. Thus, with hindsight, equal tag weighting is likely to yield
the most accurate results.


Clearly,  not  all  tags  are  really  of  equal  utility:  it  seems  plausible 
that  a  choice  currently  living  in  Chicago  is  more  likely  to  be  chosen  for  a 
Chicago  target  than,  say,  a  choice  whose  father  travels  to  Chicago.  Given  the 
limitations  of  our  experiment,  however,  we  could  not  test  for  this. 

To extend the investigation, we conducted a follow-up interview with
informant 15. He provided two new sets of data. The first was a count of the
number of connections or tags between each of his 33 choices and each of the
50 targets, now given all target information, rather than the limited
information he requested during the original experiment. Then, armed with all
the information, he told us which choice he would now make for each target.

This mini-experiment was slightly flawed for two reasons. First, in order to
reduce the many hours of the follow-up, we had collated all locations and
occupations relevant to each target. As a result, if the target lived in a
small town but was attending college in a neighboring big city, both the big
city ("the town is near X") and the college ("attends X college") were
available as location tags. This doubling-up of essentially the same
information made interpretation of the data somewhat difficult. Second, the
informant ignored the myriad of possible intervening choices, as we had
requested. This automatically removed such reasons as "I choose V, because she
knows someone at Z oil company," which had been used in the original
experiment. A more precise, informant-defined interview would be very
valuable.

Informant 15 generated many tags between choices and targets. On average,
between any choice and any target, there were 0.27 location matches, 0.05
occupation matches, 0.21 hobby matches, and 0.06 organisation matches, or 0.59
matches in all. Corresponding s.d.s are 0.31, 0.06, 0.21, 0.06, and 0.41,
respectively.

In the original experiment, the difficult and easy scores for the tag
concept for informant 15 had been 0.50, 0.86 respectively. Repeating the
calculation based on the more complete tag information reduces the accuracy
noticeably to Oji't, 0.25 (although this is now unbiased, of course, so
interpretation of the scores is somewhat altered). His final choices contained
17 alterations from the original set of choices; on three occasions he would
now prefer to make a choice outside his initial 33. Rather surprisingly, the
tag scores based on his final choices decreased slightly to O.fil*, 0.1*2.

We examined the cases where simple tag-counting yielded the wrong choice.
Almost invariably it was a matter of "how strong" a tag was: a choice living
in the target's location (1 tag) being preferred, for example, to a choice who
was born in that location and whose family still lives there (2 tags).

Thus the original tag experiment had, as we suspected, automatically scaled
the utility of most tags, and chosen the most useful ones. The follow-up
interview, however, did not contain any utilities, and this probably accounts
for the reduction in accuracy.

This suggests a variety of informant-defined experiments both to find out
what weighting of tags and tag types is necessary to yield accurate prediction
of choices, and to find what other qualities of informants and/or targets are
important in determining why some targets have stronger ties with some
informants than with others.


In RSW, 2'^ of the targets were housewives; the uniformity of response by
informants to these targets led us to reduce the number of housewives for the
present experiment. With hindsight, three housewives out of 50 targets are too
few to produce reliable statistics.


"Cities" are defined (except for Morgantown) as places we felt informants
anywhere in the U.S. would recognise.


The  same  conclusion  holds  if  attention  is  restricted  to  those  portions  of 
a  string  up  to  when  the  most  useful  question  was  asked. 


The conditional probability enables rarely-asked questions to be added to
a group.


This is rather unusual for the social sciences, perhaps. However, a
scatter diagram of a regression accounting for 40% of the variance shows
very little in the way of a signal; hence our suppression of low variance.

We  examined  only  the  2l>  most  frequently  asked  questions,  which  account 
for 95% of all questions ever asked.


Of course, we had a great deal of data about the choices in the one or two
sentence explanations given by informants. However, these data were not
collected with the idea of systematic comparison, and we simply did not know
how to code the data for such comparison. It is obvious how to collect
comparable data about choices; but this would have increased the time required
for interviews by so much that we were forced to abandon this part of our
original design.


                                  Mean    Standard    Median    Mode    Range
                                          Deviation

Age (years)                       44.8      14.4       45.2      45       50
Occupation level (Duncan scale)   45.5      28.8       44.5      19       93
Income ($1000/year)               19.9      11.8       16.1      14       65
Distance from Morgantown (km)     575       668        231        0     2470
Number of hobbies                  2.6       0.8        2.5       2        3
Number of organizations            2.5       0.8        2.2       3        4
Number of children                 3.2       1.4        3.2       4        6

TABLE 1. Some socioeconomic data about the 50 targets; income was not provided
a priori.


                                  Mean    Standard    Median    Mode    Range
                                          Deviation

Age (years)                       41.9      15.1       42.5      30       59
Occupation level (Duncan scale)   42.9      24.5       43.5      19       75
Other occupation level            34.0      24.6       19.2      19       61
Spouse's occupation level         49.8      26.4       51.0      19       94
Previous occupation level
  (first of several)              43.9      23.0       44.3      39       82
Income ($1000/year)               17.4       8.6       16.2      13       25
Number of hobbies                  2.8       1.0        2.8       3        4
Number of organizations            2.4       1.3        2.4       3        4
Number of children                 2.6       1.6        2.2       2        7

TABLE 2. Some socioeconomic data about the 50 informants.


NAME:                  Benjamin S. Clay
LOCATION:              Charleston, West Virginia (South Charleston)
BIRTHPLACE:            Columbus, Ohio (July 1.')
OCCUPATION:            Papermill laborer
PLACE OF EMPLOYMENT:   Vi lla In Kapur Co. (worked there four years)
INCOME:                $i;>,ooo

AGE:                   32
RACE:                  White
RELIGION:              Catholic (not an active member)
EDUCATION:             Graduated from Richwood High School
HOBBIES:               Pistol-shooting (competition), hunting
ORGANIZATIONS:         National Rifle Association, Committee for Handgun
                       Control, United Paperworkers International Union
SERVICE RECORD:        Served in the Navy for four years; stationed in
                       San Diego, Calif.

MARITAL STATUS:        Divorced
SPOUSE'S OCCUPATION:   Was a housewife
CHILDREN:              Two sons, ages 10 and 10 (both live with spouse in
                       Frostburg, Maryland)
FAMILY:                Came from small family
TRAVEL:                Kentucky, North Carolina (Smokey Mountains)
PREVIOUS OCCUPATION:   Gas station attendant in Columbus, Ohio.


Table 3. A typical target dossier, after all 50 informants have asked
questions. Asterisks denote information allocated at the beginning of the
experiment.


TABLE 4

(See legend at end of table.)

 1) What is the target's name?
 2) What is the target's age?
 3) What is the target's occupation?
 4) Where does the target work (i.e., name of company)?
 5) Does the target have any other occupation?
 6) What is the occupation of the target's spouse?
 7) Does the target have any previous occupation?
 8) What is the target's income?
 9) What is the target's religion?
10) Is the target active in religion?
11) Where does the target attend church?
12) What is the race of the target?
13) What is the target's ethnic background?
14) Where does the target live?
15) Is the target a resident or a commuter?
16) Where is the target's location near? That is, what is the closest
    well-known urban area?
17) Has the target ever lived anywhere else?
18) How long ago did the target move?
19) How long has the target lived in the present location?
20) Where was the target born and raised?
21) What are the target's hobbies?
22) How far has the target gone in school?
23) Where did/does the target go to school?
24) What is/was the target's field of study in school?
25) Where did/does the target's spouse go to school?
26) What organizations does the target belong to?
27) Is the target active in his/her organizations?
28) What organizations does the target's spouse belong to?
29) Is the target male or female?
30) What is the target's marital status?
31) How long has the target been married?
32) Is the target's spouse alive?
33) How old is the target's spouse?
34) Has the target been divorced? [=30]
35) Does the target have a family (mother, father, etc.)?
36) Where does the target's family live?
37) Does the target travel?
38) Where does the target travel?
39) Does the target have children?
40) How many children does the target have?
41) What are the ages of the target's children?
42) Where do/did the target's children go to school?
43) Has the target served in the armed services?
44) What branch of the armed forces did the target serve in?
45) What is the name of the target's spouse?
46) Does the target play an instrument? [=21]
47) Does the target's spouse go to school? [=6]
48) How long has the target worked at his/her present occupation?
49) What is the rank of the target in college?
50) Does the target have a boy/girlfriend?
51) Does the target go to school?
52) Exactly where in the city does the target live (i.e., neighborhood)?
53) Where do the target's children live?
54) What are the occupations of the target's children?
55) What is the place of employment of the target's children?
56) How many boys and girls does the target have?
57) Is the target retired?
58) Has the target had any previous occupation? [=7]
59) Is the target's spouse a native of the state where they live?
60) Is the target's business large or small?
61) What is the age of the target's parents?
62) Has the target's spouse lived anywhere else? [=68]
63) Where does the target's spouse work?
64) What is the divorced spouse's location?
65) What is the medical specialty of the target?
66) Does the target have any grandchildren?
67) Has the target lived anywhere else? [=17]
68) Where was the target's spouse born and raised?
69) What is the exact birthdate of the target?
70) What is the target's spouse's age? [=33]
71) How long has the target been married? [=31]
72) How far has the target's spouse gone in school?
73) What is the income of the target? [=8]
74) Does the target live in a house, apartment, or condominium?
75) What is the target's physical condition?
76) What is/are the occupations of the target's parents?
77) What is the SES of the target's parents?
78) Does the target have a boy/girlfriend? [=50]
79) What is the occupation of the target's boy/girlfriend?
80) Where does the target's boy/girlfriend work?
81) What is the previous location, if any, of the target's spouse? [=68]
82) How many years has the target's spouse worked at the present occupation?
83) What is/was the target's children's field of study?
84) Is the target's spouse active in religion?
85) When did the target graduate or stop going to school?
86) Where do/did the target's parents work?
87) Is the target's family large or small (i.e., mother, father, brothers,
    etc.)?
88) What is the target's SES?
89) How many people work for the target?
90) What are the occupations of the members of the target's family, excluding
    the target's mother and father?
91) Where was the target stationed in the armed service?
92) How many years was the target in the service?
93) How long ago was the target divorced?
94) Has the target ever published anything?
95) Who supports the target's child/children?
96) Is the target's child/children enrolled in a day care center?
97) Does the target live alone?
98) Is the target paying alimony?
99) Other questions are lumped in this category (six one-offs).

Table 4. All questions ever asked by all informants. There is no connotation
as to order. Asterisks imply questions asked in pretesting only. Brackets
imply question equivalent to an earlier one.


Group   Contents          Comments

1       3                 occupation
2       14                location
3       18                how long ago did target move? (rarely asked)
4       77                what is the socioeconomic status of the target's
                          parents? (very rarely asked)
5       82                how many years has the target's spouse worked at
                          the present occupation? (very rarely asked)
6       2,21,26,90,97     age, hobbies, organizations and rare questions
7       29,30,39,48,98    sex, marital status, children, and rare questions
8       all other questions

TABLE 5. Groups of questions found by clustering.


Topic                                            Splitting of Informants

No. of times question 1 asked                    have children ( 1 1 ) vs. no children (2)
No. of times question 38 asked                   travel (1.9) vs. non-travel (9)
No. of times question 2 most useful              males (0.09) vs. females (0.8)
No. of times question 38 most useful             travel (Oji) vs. non-travel (2.3)
No. of times question 6 second most useful       urban background (.06) vs. rural backg. (0.06)
No. of times question 16 second most useful      urban background (1) vs. rural backg. (0.02)
No. of times question 3 at all useful            males (22) vs. females (32)
No. of times question 38 at all useful           travel (0.9) vs. non-travel (1<)
No. of times question 1 was not useful           have children (9*2) vs. no children (1.5)
No. of times question 4 was not useful           have children (^.8) vs. no children (2.5)
No. of times question 38 was not useful          travel (1) vs. non-travel (5)
No. of times question 14 was most useful
  and asked last in a string                     males (0.6) vs. females (3.3)
No. of times question 38 was most useful
  and asked last in a string                     travel (0.3) vs. non-travel (1.7)

TABLE 6. The mean differences between question usage, significant at the 1%
level, between informants. 5% significances have been removed for reasons
given in the text.


No. of times question 29 asked                   males (17) vs. females (19)
No. of times question 63 asked                   males (0.15) vs. females (?)
No. of times question 3 was most useful          males (18.5) vs. females (13.5)
No. of times question 6 was most useful          males (0.03) vs. females (1.5)
No. of times question 3 was most useful          urban (13.5) vs. rural (18.5)
No. of times question 14 was most useful         urban (?0.5) vs. rural (l?)
No. of times question 21 was second most useful  urban background (1.4) vs. rural backg. (0.1)
No. of times question 6 was useful               have children (2.3) vs. no children (0.2)
No. of times question 6 was not useful           have children (4.5) vs. no children (0.2)
No. of times question 40 was not useful          travel (1.2) vs. non-travel (0.4)

TABLE 7. The mean differences between question usage, significant at the 1%
level, between targets. Obviously significant splits (e.g., question 16 split
by urban and rural targets) are suppressed.


Table 8. Target characteristics related to question
Sign is direction of relationship.


Cutoff            444 km  222 km  111 km  56 km  28 km  14 km  7 km  0 km

Difficult score    0.31    0.37    0.48   0.54   0.57   0.59   0.59  0.60
Easy score         0.53    0.63    0.77   0.85   0.87   0.89   0.89  0.89

TABLE 9. Average accuracy scores obtained by predicting that informant will
make the 'optimal' choice.


    Weighting                                      Difficult Score  Easy Score

1.  Only location tags counted                          0.34           0.59
2.  Only occupation tags counted                        0.33           0.49
3.  Only location and occupation tags counted           0.52           0.82
4.  Only location tags counted; tags which are
    direct hits are given double weighting              0.32           0.46
5.  As 4, but for occupation                            0.33           0.49
6.  As 4, but for location and occupation               0.47           0.70
7.  Only non-(location or occupation) tags counted      0.19           0.31
8.  All tags counted; location and occupation
    weighted as in 4                                    0.55           0.77

TABLE 10. Average easy and difficult scores for various weightings of tags.


Figure. Question usage, by questions.


Figure 3a. Fractional amount of questions deemed "most useful" by informants.
[Axis label: QUESTIONS.]


Figure 3b. As in Fig. 3a, but for questions deemed "at all useful."
[Axis label: QUESTIONS.]


Figure 5. Probability of various questions being asked first, second, third or
fourth in a string. [Axis label: QUESTION NUMBER. Legend: first question,
second question, third question, fourth question.]


Figure 6a. Most likely sequence of questions for strings beginning with
question 14. Figures in parentheses are percentage probabilities of questions
being asked, given that their predecessors were asked and were either useful
or non-useful (a useful response branches upwards; a non-useful downwards).
Sequences end when all potential strings are exhausted, or after f> questions.


Figure 7. Chains of causality for questions. The lower half should ideally be
inverted and proceed upward from 29, but for clarity it moves downwards.
Question j is related to i at level p if the probability that i immediately
precedes j, given j was asked, multiplied by the probability that j is not
asked, given that i was not, exceeds p. Most of the 0 - 0.75 links are very
weak (about 0.02).


Figure 9. As in Figure 8, but distributed by different category of tag.


Figure 11. As for Figure 10, but distributed by different category of tag.


